While pointwise operations treat each element of a tensor independently, reduction patterns introduce data dependencies in which multiple input elements are combined into a single output value (for example a sum, maximum, or mean). Implementing these operations efficiently requires bridging the gap between the data's logical two-dimensional structure and its linear representation in hardware memory.
1. 2D Memory Mapping
A 2D tensor is logically a grid but is physically linear in memory. Understanding row-major versus column-major layout is essential for determining whether a reduction will access consecutive memory addresses or require strided access.
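As a concrete illustration (using NumPy here, which is an assumption — the lesson itself targets Triton), the same logical grid can be stored row-major or column-major, and the array strides reveal which axis is contiguous:

```python
import numpy as np

# The same 3x4 logical grid in two physical layouts.
a_row = np.arange(12, dtype=np.float32).reshape(3, 4)  # row-major (C order)
a_col = np.asfortranarray(a_row)                       # column-major (F order)

# Strides are in bytes; float32 = 4 bytes.
print(a_row.strides)  # (16, 4): stepping along a row moves 4 bytes -> contiguous
print(a_col.strides)  # (4, 12): stepping along a row moves 12 bytes -> strided
```

A row-wise reduction over `a_row` walks adjacent addresses, while the same reduction over `a_col` must jump by a full column's worth of bytes per element.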
2. Pointwise vs. Reduction Topology
A matrix copy represents a pointwise operation with a $1:1$ input-to-output mapping. In contrast, a reduction is a many-to-one ($N:1$) operation that requires either sharing accumulated results across threads or processing sequentially within a block.
3. Dimension Collapse
A reduction is defined by the axis along which it operates. Reducing along axis 1 (across each row) versus axis 0 (down each column) fundamentally changes the memory stride pattern and the hardware cache hit rate.
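A quick sketch (again using NumPy for illustration) of how the reduction axis determines both the output shape and the stride of each accumulation step for a row-major tensor:

```python
import numpy as np

x = np.ones((1024, 512), dtype=np.float32)  # row-major by default

row_sums = x.sum(axis=1)  # collapse columns: one value per row
col_sums = x.sum(axis=0)  # collapse rows: one value per column

print(row_sums.shape)  # (1024,) - axis-1 reduction reads contiguous elements
print(col_sums.shape)  # (512,)  - axis-0 reduction jumps 512*4 = 2048 bytes per step
```

Both calls compute the same kind of aggregate, but only the axis-1 reduction streams through memory linearly; the axis-0 reduction strides across rows.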
QUESTION 1
[Short Answer] How does a matrix copy differ from a reduction?
A matrix copy is a 1:1 pointwise operation; a reduction is a many-to-one operation requiring data synchronization.
✅ Correct!
Pointwise operations (like copy) map one input to one output, whereas reductions collapse multiple inputs into a single statistic.
❌ Incorrect
Think about the mapping ratio. A copy is 1:1, but a reduction (like sum) is N:1.

QUESTION 2
Which memory layout is characterized by elements of the same row being stored in adjacent memory addresses?
Column-major
Row-major
Strided-major
Z-order curve
✅ Correct!
Row-major (C-style) layout stores $A[i][j]$ next to $A[i][j+1]$.
❌ Incorrect
In column-major (Fortran-style), elements of the same column are contiguous.

QUESTION 3
If we reduce a tensor of shape (M, N) across axis 1, what is the resulting shape?
(M, 1) or (M,)
(1, N) or (N,)
(1, 1)
(M, N)
✅ Correct!
Reducing across axis 1 collapses the columns, leaving one value per row (size M).
❌ Incorrect
Axis 1 represents the column dimension in a 2D tensor.

QUESTION 4
Why is 'Bias Addition' considered a pointwise operation compared to 'Softmax'?
Bias addition requires every element in a row to be summed first.
Each output element in a bias add depends only on its corresponding input element and a constant.
Bias addition is performed in global memory only.
Softmax does not involve any exponentiation.
✅ Correct!
Because each addition is independent of other elements in the tensor.
❌ Incorrect
Pointwise operations lack the cross-element data dependencies found in reductions.

QUESTION 5
What is the primary architectural challenge when implementing a reduction in Triton?
Writing the result back to global memory.
Communicating or 'voting' across threads to find a single value (e.g., max).
Using the address-of operator.
Handling floating point addition.
✅ Correct!
Reductions require data dependencies where threads must synchronize or share results to compute the final aggregate.
❌ Incorrect
The challenge lies in the N-to-1 dependency, not simple I/O.

Case Study: Architectural Analysis of Row-Wise Sum
Analyzing Memory vs. Compute Topology
You are tasked with optimizing a row-wise sum for a 1024x1024 matrix stored in row-major format. The kernel reads an entire row into SRAM before performing the reduction.
Q
How does the memory access pattern differ between a matrix copy and this row-wise sum?
Solution:
In a matrix copy, both the read and write operations are contiguous and $1:1$, allowing for high-throughput coalesced memory access. In a row-wise sum, the read is contiguous (loading the row), but the write is $N:1$, where 1024 elements produce only 1 output scalar, significantly changing the bandwidth-to-compute ratio.
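A minimal sketch of the one-program-per-row pattern described above, written in plain Python/NumPy as a stand-in for the Triton kernel (the function name, the `pid` loop, and the block size are illustrative assumptions):

```python
import numpy as np

def row_sum_kernel_sim(x, block_size=1024):
    """Simulate one 'program' per row: load a contiguous row, reduce to a scalar."""
    m, n = x.shape
    out = np.empty(m, dtype=x.dtype)
    for pid in range(m):                  # each program id maps to one row
        row = x[pid, :block_size]         # contiguous read (coalesced on a GPU)
        out[pid] = row.sum()              # N:1 reduction -> a single scalar write
    return out

x = np.ones((1024, 1024), dtype=np.float32)
out = row_sum_kernel_sim(x)
print(out[:3])  # each row of 1024 ones sums to 1024.0
```

The asymmetry is visible in the shapes alone: 1024×1024 elements are read, but only 1024 scalars are written, which is exactly the bandwidth-to-compute shift the solution describes.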
Q
Why is understanding row-major layout critical for this specific reduction?
Solution:
Because the reduction is row-wise, row-major layout ensures that all 1024 elements of a row are contiguous in physical RAM. If the matrix were column-major, summing a row would require strided access (jumping across memory addresses), which would significantly degrade performance due to poor cache utilization.
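The layout effect can be made concrete (NumPy assumed for illustration): for the 1024×1024 matrix in the case study, consecutive elements of the same row sit 4 bytes apart in row-major order, but 4096 bytes apart in column-major order:

```python
import numpy as np

m = np.zeros((1024, 1024), dtype=np.float32)
c_order = np.ascontiguousarray(m)   # row-major layout
f_order = np.asfortranarray(m)      # column-major layout

# Byte step between consecutive elements of the SAME row:
print(c_order.strides[1])  # 4    -> adjacent addresses, cache-friendly row sum
print(f_order.strides[1])  # 4096 -> one full column per step, poor cache reuse
```

Every element loaded during a row sum over the column-major copy lands in a different cache line, which is why the solution flags strided access as the performance hazard.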